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Abstract: We describe a fully-vectorial, three-dimensional algo- 
rithm to compute the definite- frequency eigenstates of Maxwell's equa- 
tions in arbitrary periodic dielectric structures, including systems with 
anisotropy (birefringence) or magnetic materials, using preconditioned 
block- iterative eigensolvers in a planewave basis. Favorable scaling with 
the system size and the number of computed bands is exhibited. We 
propose a new effective dielectric tensor for anisotropic structures, and 
demonstrate that 0(Ax 2 ) convergence can be achieved even in sys- 
tems with sharp material discontinuities. We show how it is possible to 
solve for interior eigenvalues, such as localized defect modes, without 
computing the many underlying eigenstates. Preconditioned conjugate- 
gradient Rayleigh-quotient minimization is compared with the David- 
son method for eigensolution, and a number of iteration variants and 
preconditioners are characterized. Our implementation is freely avail- 
able on the Web. 
© 2001 Optical Society of America 
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1 Introduction 

Optical systems have been the subject of enormous practical and theoretical interest 
in recent years, with a corresponding need for mathematical and computational tools. 
One fundamental approach in their analysis is eigenmode decomposition: the possible 
forms of electromagnetic propagation are expressed as a set of definite- frequency (time- 
harmonic) modes. In the absence of nonlinear effects, all optical phenomena can then 
be understood in terms of a superposition of these modes, and many forms of analyt- 
ical study are possible once the modes are known. Of special interest are periodic (or 
translationally-symmetric) systems, such as photonic crystals (or waveguides), which 
give rise to many novel and interesting optical effects [1]. Another important basic sys- 
tem is that of resonant cavities, which confine light to a point-like region. There, the 
boundary conditions are, in principle, irrelevant if the mode is sufficiently confined — so 
they can be treated under the rubric of periodic structures as well via the "super- 
cell" technique. In this paper, we describe a fully-vectorial, three-dimensional method 
for computing general eigenmodes of arbitrary periodic dielectric systems, including 
anisotropy, based on the preconditioned block-iterative solution of Maxwell's equations 
in a planewave basis. A new effective dielectric tensor for anisotropic systems is intro- 
duced, and we also describe a technique for computing eigenvalues in the interior of 
the spectrum (e.g. defect modes) without computing the underlying bands. We present 
comparisons of different iterative solution schemes, preconditioners, and other aspects of 
frequency-domain calculations. The result of this work is available as a free and flexible 
computer program downloadable from the Web [2]. 

There are a few common approaches to eigen- decomposition of electromagnetic sys- 
tems. One, which we employ in this paper, is to expand the fields as definite-frequency 
states in some truncation of a complete basis (e.g. planewaves with a finite cutoff) and 
then solve the resulting linear eigenproblem. Such methods have seen widespread use, 
with many variations distinguished by the critical choices of basis and eigensolver al- 
gorithm [3-17]. This "frequency-domain" method is discussed in further detail below. 
Another common technique involves the direct simulation of Maxwell's equations over 
time on a discrete grid by finite-difference time-domain (FDTD) algorithms [18] — one 
Fourier-transforms the time- varying response of the system to some input, and the 
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peaks in the resulting spectrum correspond to the eigenfrequencies [19-24]. (Care must 
be taken to ensure that the source is not accidentally near-orthogonal to an eigenmode.) 
This has the unique (and sometimes desirable) feature of finding the eigenfrequencies 
only — to compute the associated fields, one re-runs the simulation with a narrow-band 
filter for each state. (Time-domain calculations can also address problems of a dynamic 
nature, such as transmission processes, that are not so amenable to eigenmethods.) A 
third class of techniques are referred to as "transfer-matrix" methods: at a fixed fre- 
quency, one computes the transfer matrix relating field amplitudes at one end of a 
unit cell with those at the other end (via finite-difference, analytical, or other meth- 
ods) [25-30]. This yields the transmission spectrum directly, and mode wavevectors via 
the eigenvalues of the matrix; in some sense, this is a hybrid of time- and frequency- 
domain. Transfer-matrix methods may be especially attractive when the structure is 
decomposable into a few more-easily soluable components, and also for other cases such 
as frequency-dependent dielectrics. 

In any method, the computation is characterized by some number N of degrees of 
freedom (e.g., the number of planewaves or grid points), and one might be interested 
to compare how the number of operations (the "complexity" ) in each algorithm scales 
with N. Unfortunately, there is no simple answer. As we shall see below, the complexity 
in frequency-domain is 0{i c pN log N) + 0(i c p 2 N) for a planewave basis, where p < N 
is the number of desired eigenmodes and i c is the number of iterations for the eigen^ 
solver to converge. In time-domain, the complexity is 0(i t N) to find the frequencies, 
and 0(pi t N) to also solve for the fields of p modes, where i t is the number of time steps. 
(Transfer-matrix methods have too many variations to consider here.) The difficulty in 
both cases comes from the number of iterations, which scales in different ways depend- 
ing upon how the problem size is increased. In frequency-domain, i c is hard to predict, 
but we shall show below that it often grows only very slowly with p and N, so we 
can treat it as approximately constant (often < 20). In time-domain, however, i t must 
increase linearly with the spatial resolution (a certain kind of N increase) to maintain 
stability [18]. It also increases inversely with the required frequency resolution, by the 
uncertainty principle of the Fourier transform: it At A / ~ 1 (where At is the timestep). 
Not only, then, is i t a large number (typically > 1000), but it must also increase dra- 
matically to resolve closely-spaced modes (although this can be ameliorated somewhat 
by sophisticated signal processing [31]); in contrast, frequency-domain methods have no 
special problem resolving even degenerate modes. One traditional advantage of time- 
domain has been its ability to extract modes in the interior of the spectrum without 
computing the lower- frequency states, but we will show that this is feasible in frequency- 
domain as well. We feel that both time- and frequency-domain methods remain useful 
tools to extract eigenmodes from many structures. 

There is sometimes concern that discontinuities in the dielectric function may cause 
poor convergence in a planewave basis. As is described in Sec. 2.3, however, this can be 
alleviated by the use of smoothed effective dielectric tensor [5] , and we demonstrate that 
convergence proportional to the square of the spatial resolution, Ax 2 , can be achieved 
even for sharply discontinuous anisotropic dielectric structures. 

In the paper that follows, we describe in greater detail our method for obtaining the 
eigenmodes of Maxwell's equations, dividing the discussion into two parts: Maxwell's 
equations and eigensolvers. First, we review how Maxwell's equations can be cast as 
an eigenproblem for the frequencies, discuss the choice of basis and the computation 
of an effective dielectric tensor, and consider the critical selection of an approximate 
preconditioner. Second, we describe various block-iterative algorithms for solving this 
eigensystem; in principle this is independent of the equations being solved, but in prac- 
tice there are a number of specific considerations. Throughout, we illustrate the methods 
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being compared with numerical results for example systems. 
2 The Maxwell Eigenproblem 

We first express Maxwell's equations as a linear eigenproblem, abstracting where pos- 
sible from the differential equations in the individual field components to a higher-level 
view that better illuminates the overall process. To this end, we employ the Dirac no- 
tation of abstract operators A and states \H) to provide a representation-independent 
expression for the fields and inner products, and later use ordinary matrix notation 
to indicate the transition to a finite problem. (The underlying equations remain fully 
vectorial.) This eigenproblem formulation has appeared in various forms elsewhere, and 
we begin by reviewing it here. The source-free Maxwell's equations for a linear dielectric 
e = e (f) can be written in terms of only the magnetic field \H) [1]: 

Vxivx|ff) = -i|^|tf>, (1) 

V • \H) = 0. (2) 

We consider only states with definite frequency i.e. a time-dependence e" twt . 
Furthermore, we suppose that the system is periodic — in that case, Bloch's theorem for 
periodic eigenproblems says that the states can be chosen to be of the form [32]: 

\H) = e^*-^ \H n ) , (3) 

where k is the "Bloch wavevector" and \H%) is a periodic field (completely defined by 
its values in the unit cell). Eq. (1) then becomes the linear eigenproblem in the unit 
cell: 

A s \H s ) = (u,/c) 2 \H s ), (4) 
where A% is the positive semi-definite Hermitian operator: 

Ag = (V + ik^j x^V + ifc)x. (5) 

All the familiar theorems of Hermitian eigenproblems apply. Because \H%) has com- 
pact support, the solutions are a discrete sequence of eigenfrequencies u; n (k) forming a 
continuous "band structure" (or "dispersion relation") as a function of k. These discrete 
bands (modes as a function of k) provide a complete picture of all possible electromag- 
netic states of the system, but typically one is interested in only the lowest few. (For 
example, in a photonic crystal it is possible for there to be a band gap in the lower 
bands: a range of u in which no states exist [1].) Furthermore, the modes at a given k 
may be chosen to be orthonormal: 

(h™\h™) = 6 n<m , (6) 

where <S n>m is the Kronecker delta. 

2.1 The choice of basis 

In frequency-domain methods, Eq. (4) is transformed into a finite problem by expanding 
the states in some truncated (possibly vectorial) basis {|i> m )}: 

N 

\Hg) S £ hm \b m ) . (7) 
m=l 
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This expression becomes exact as the number N of basis functions goes to infinity, 
assuming a complete basis. One then has the ordinary generalized eigenproblem 

Ah = (^) Bh, (8) 

where h is a column vector of the basis coefficients /i m , and A and B are N x N 
matrices with entries A e m = (be\A^\b m ) and B im = (be\b m ). It is important to note that 
Eq. (8) by itself is incomplete, however — the modes must also satisfy the "transversality" 
constraint of Eq. (2); zero- frequency spurious modes are otherwise introduced, as can 
be seen by taking the divergence of both sides of Eq. (1). 

In principle, one could compute the entries of A and B and then use a standard 
matrix algorithm to solve Eq. (8) directly, and this method is sometimes employed [3,4,6, 
7,10]. Such a computation, however, requires 0(N 2 ) storage for the matrices and 0(N 3 ) 
work for the diagonalization, making it impractical for large problems. Fortunately, since 
only p bands are typically desired, for some p iV, iterative methods are available to 
compute the bands with only 0{pN) storage and roughly 0(p 2 N) work; these methods 
are the subject of Sec. 3. The relevant property of iterative methods is this: they require 
only a fast, ideally O(N), method to compute the products Ah and Bh, with no need 
to store the matrices explicitly. 

The choice of basis functions |6 n ), then, is determined by three factors. First, they 
should form an compact representation so that a reasonable N yields good accuracy. 
Second, a convenient and efficient method for computing Ah and Bh must be available. 
Third, they should be inherently transverse, or otherwise provide an inexpensive way 
to maintain the constraint of Eq. (2). 

2.1.1 The planewave basis 

We chose to use a planewave basis (for reasons described below) [3-7], in which |6 m ) = 
e iG m x f or some reciprocal-lattice vectors G m ; the truncation N is determined by choos- 
ing a maximum cutoff for the magnitude of G m . Strictly speaking, a cutoff magnitude 
would result in a spherical volume of G vectors, but we expand this into a parallelop- 
iped volume so that the transformation between planewave and spatial representations 
takes the convenient form of a Discrete Fourier Transform (DFT). (Such an extension 
also removes an ambiguity of the order in which to invert and Fourier- transform e [5].) 
The planewave set then has a duality with a spatial grid, which is often a more in- 
tuitive representation. In particular, suppose that the three primitive lattice vectors 
(the units of periodicity) are ^R\,R2,Rz^ and the primitive reciprocal-lattice vec- 
tors are jGi,G2,C?3 j, defined by Ri • Gj = 2^5^ [32]. Then the basis functions are 

\b mu m 2i m 3 ) = e* with m j = - \Nj/2] + 1, . • • , L^/2], 1 N = NiN 2 Ns, and 

Eq. (7) for the spatial field becomes: 

V k ) { mj } {m } } 

(9) 

Here, = 0, • • • , Nk - 1 describe spatial coordinates on an N\xN 2 x jV 3 affine grid along 
the lattice directions. This is precisely a three-dimensional DFT, and can be computed 
by an efficient Fast Fourier Transform (FFT) algorithm [33] in O (NlogN) time. 

1 This is equivalent to mj = 0, • • • , Nj — 1 for the DFT, in which rrij is interpreted modulo Nj, but 
choosing zero-centered wavevectors is important when taking derivatives of the basis. 
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Thus, in a planewave representation, the product Ah from Eq. (8) can be computed 
in 0(N log N) time by taking the curl in wavevector space (just the cross-product with 
k + G m ), computing the FFT, multiplying by computing the inverse FFT, and 
taking the curl again [5]: 

A im = - (k + Ge) x ...IFFT.-.pi...FFT...(£ + G m ) x . (10) 

The determination of an effective inverse dielectric tensor, e" 1 , is discussed in Sec. 2.3. 
The matrix B is simply the identity, thanks to the orthonormality of the basis. 

Since the basis functions themselves are scalars in this case, the amplitudes h m in 
Eq. (7) must be vectors. In addition to Eq. (8), the field must satisfy the transversality 
constraint, and it is here that a key advantage of the planewave basis becomes apparent: 
Eq. (2) becomes merely a local constraint on the amplitudes, h m • (fc + G m ) =0. For 

each reciprocal vector (5 m one chooses a pair {u m ,i> m } of orthonormal unit vectors 
that are perpendicular to fc + G m , and writes the amplitude as 

Then the basis is intrinsically transverse, and one can treat Eq. (8) as an ordinary 
eigenproblem of rank n = 2N without worrying about any constraint. 

2.1.2 Other possible bases 

The planewave basis has at least two potential disadvantages: first, it corresponds to a 
uniform spatial grid, and may thus be a less economical representation than one based 
on e.g. a general mesh; second, the computation of the Maxwell operator A requires 
0(N log N) time instead of O(iV), although the difference may be small in practice. 
Both of these problems could be overcome by using a different basis — for example, a 
traditional finite-element basis formed of localized functions on an unstructured mesh. 
Such a basis, however, would make it more difficult to maintain the transversality con- 
straint, which is why we eschew it in our implementation. (One way around this might 
be to replace the magnetic field with the vector potential, H = V x A, although this 
introduces higher-order derivatives into the eigenproblem.) Alternatively, it is possible 
to solve the eigenproblem without transversality and a posteriori identify and remove 
the resulting spurious modes (which lie at zero frequency unless a "penalty" term is 
added to the eigen-equation) [8, 14]. 

In two dimensions, though, transversality ceases to be a problem: one simply chooses 
the magnetic field along z (TE fields) or the electric field along z (TM fields, for which 
the eigenproblem could be recast in E or D). This fact has been employed by various 
researchers to implement finite-element or other-basis frequency-domain methods in 
2D [8-14]. (Our implementation also supports the 2D TE/TM case, of course.) 

Given the eigenmodes for the primitive cell of a lattice, it may be possible to use them 
to construct a localized Wannier-function representation that is a useful basis for defect- 
mode calculations [15,16], although the non-uniqueness of the Wannier functions must 
be resolved, e.g., by fitting to the precomputed band structure in "tight-binding" fashion 
[16]. Such a basis is automatically divergenceless since the constituent eigenmodes are 
transverse. Another possibility is a tight-binding basis that is not specified explicitly, 
but whose matrix elements are fitted to an existing band diagram [17]. 

2.2 Inversion symmetry 

In general, the basis coefficients h are complex, but additional simplifications are pos- 
sible when the dielectric function possesses inversion symmetry: e(— x) = s(x). The 
Fourier transform of a real and even function is real and even, so it follows that the 
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Fig. 1. Eigenvalue convergence as a function of grid resolution (grid points per 
lattice constant a) for three different methods of determining an effective dielectric 
tensor at each point: no averaging, simply taking the dielectric constant at each grid 
point; averaging, the smoothed effective dielectric tensor of Eq. (12); and backwards 
averaging, the same smoothed dielectric but with the averaging methods of the two 
polarizations reversed. 

planewave representation of Eq. (10), is then a real-symmetric matrix. This means 
that the planewave amplitudes /i m can be chosen to be purely real, resulting in a factor 
of two savings in storage, more than a factor of two in time (due to more-efficient FFTs 
and matrix operations for real data), and possibly a reduction in the number of eigen- 
solver iterations (due to the reduced degrees of freedom). The spatial fields, in contrast, 
cannot generally be chosen as real — rather, with inversion symmetry they may be chosen 
to satisfy the property of the Fourier transform of a real function: H% (x) = H% (—£)*. 
Because inversion symmetry is extremely common in practical structures of interest, 
our implementation supports this optimization when the symmetry is present. For gen- 
erality, however, we also handle the case of complex h for non-symmetric systems. 

2.3 The effective dielectric tensor 

When computing the product A% m a planewave basis, the multiplication by e -1 is 
done in the spatial domain after a Fourier transform, so one might simply use the inverse 
of the actual dielectric constant at that point. Unfortunately, this can lead to suboptimal 
convergence of the frequencies as a function of AT, due to the problems of representing 
discontinuities in a Fourier basis. It has been shown, however, that using a smoothed, 
effective dielectric tensor near dielectric interfaces can circumvent these problems, and 
achieve accurate results for moderate N [5]. In particular, near a dielectric interface, one 
must average the dielectric in two different ways according to effective-medium theory, 
depending upon the_polarization of the incident light relative to the surface normal n. 
For an electric field E \\ ft, one averages the inverse of e; for E _L n, one takes the inverse 
of the average of e. This results in an effective inverse dielectric tensor e~ l : 

pi = FTp + £-1(1 _P) (11) 
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where P is the projection matrix onto h: Pij — riiUj. Here, the averaging is done over 
one voxel (cubic grid unit) around the given spatial point; if e is constant, Eq. (11) 
reduces simply to e -1 . (The original formulation was in terms of e, but is equivalent.) 

We have generalized this procedure to handle the case of anisotropic (birefringent) 
dielectric materials, in which case the intrinsic e is already a real-symmetric tensor. Or, 
more generally, for magnetic materials e may be complex-Hermitian. In this case, an 
analogous equation is needed that: (i) produces a Hermitian effective inverse dielectric, 
(ii) reduces to Eq. (11) for the case of scalar e, (iii) yields e~ x for constant e, (iv) 
retains the physical justification of the different averaging methods for the different 
polarizations, and (v) similarly improves convergence. The expression we propose that 
satisfies these criteria is: 

F i = \({^,p} + {e-\(l-P)}), (12) 

where e may be a tensor and {a, b} denotes the anti-commutator ab + ba. Our first three 
conditions are manifestly satisfied. A physical justification of this formula is that a given 
averaging method should be used when either the field or the inverse e times the field is 
in the appropriate direction relative to n, hence the anti-commutators. To illustrate the 
convergence impact of Eq. (12), we consider a simple example case similar to that of [7]: 
an fee lattice (lattice constant a) of close-packed spherical holes in dielectric (e = 12), 
where the holes are filled by an anisotropic "liquid crystal" material with an e of 2 
for fields along an "extraordinary" Oil (x) direction and 1 otherwise. We compute the 
frequency of the ninth band (just below the gap for air holes) at the L point as a function 
of grid resolution for three cases: no e averaging, with averaging as in Eq. (12), and with 
averaging backwards from Eq. (12) (P switched with 1 -P). The results, shown in Fig. 1, 
exhibit a significant acceleration of convergence by the averaging; conversely, the poor 
convergence of the backwards averaging demonstrates the importance of polarization for 
the smoothing method. With the averaging of Eq. (12), we see that the error decreases 
with the square of the resolution, just as for standard FDTD methods [18]. 

As a practical matter, it can be cumbersome to compute (or even to define) the 
surface normal n for complicated three-dimensional structures, so we implement an 
approximation. Given a flat dielectric interface, the normal direction is exactly J re 
over a spherical surface intersecting the interface, where r is the vector from the center 
of the sphere. This procedure also yields the correct normal for spherical and cylindrical 
interfaces, by symmetry. Therefore, we use this spherical average to define the normal 
direction in all cases. 2 In order to compute the average, we employ a 50-point 11th- 
degree spherical-quadrature formula [34] to numerically integrate re over a spherical 
surface inscribed within the e averaging voxel. (Testing this method by computing the 
normal vectors to a large number of random planar surfaces, we found a mean angular 
error of about 5°.) 

An interesting unanswered question is whether a similar effective e tensor would 
improve the convergence of FDTD methods, for which scalar averaging methods have 
already been explored to improve modeling of discontinuous material interfaces [35,36]. 

2.4 Preconditioners 

A critical factor in the performance of an iterative eigensolver is the choice of "precon- 
ditioning" operator. As will be explained in Sec. 3, our preconditioner requires us to 
supply an approximate inverse A~* of A such that A~ l h can be computed quickly. This 

2 This method for defining n can produce suboptimal results when the averaging voxel straddles two 
near-parallel dielectric interfaces. Preliminary investigations show that gains of a factor of two or more 
in eigenvalue accuracy are sometimes possible if a better approximation for ft is used in such cases. 
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choice of a "good" preconditioner is highly problem-dependent, and is unfortunately a 
matter of trial and error. We consider two possible preconditioners. The first is a di- 
agonal preconditioner [5,12,37], inspired by the "kinetic energy" preconditioners often 
used in electronic calculations [38]. In this case, A is simply the diagonal elements of A: 



Ae m = fc + G>, 



2 



*€,m, (13) 



where we have dropped an overall scaling factor (irrelevant to a preconditioner). The 
motivation for this approximation is that, for large G components, the operator A^ is 
dominated by the curl operations and not by the variations in e. Computing A~ l h is 
then an O(N) diagonal operation. 

We also consider a more accurate inverse [37]. Ideally, one would like to simply reverse 
each of the steps in computing Ah via the plane wave representation of Eq. (10) to find 
an exact inverse. Note that a cross product of perpendicular vectors is invertible. So, 
the only operation of Eq. (10) that is not trivially reversible is the final cross product 
with k + Gi, since —iu>E = e~ l V x H is not generally transverse (divergenceless). 
Since e is normally piecewise constant and V • eE = 0, however, it is plausible^ that 
E is "mostly" divergenceless. With this in mind, we approximate the operator ^4.^ by 
inserting projection operators Pt onto the transverse field components: 

A = Vx Pt^-PtVx, (14) 

where the rightmost Pt is superfluous and serves only to make the operator clearly 
Hermitian. Both curls are now reversible, since they act on transverse planewaves. The 
inverse of Eq. (14) can be thus applied in 0(N log N) time in a manner exactly analogous 
to Eq. (10): invert the k + Ge cross product (projecting the result), FFT, multiply by e, 
inverse FFT, and invert the £-f-G m cross product. This preconditioner is more expensive 
to compute than that of Eq. (13), but we shall show that it yields a substantial speedup 
in the eigensolver. 

2.4.1 Removing the singularity 

There is one evident obstacle in this preconditioning process — at k = 0 (the T" point), 
the operator Ag is singular. (This will also be a problem for iterative methods such 
as conjugate-gradient, which are known to converge poorly for ill-conditioned matrices 
[46].) This difficulty is easily overcome, however, because the singular solutions are 
known analytically: they are the constant-field solutions at lj = 0. So, at T we simply 
remove these singular solutions from the basis space (only considering planewaves with 
G m ^ 0), and re-insert them after we have solved for the non-singular eigenvectors. 



3 Iterative Eigensolvers 

Iterative eigensolvers are fast methods for computing the p <^n smallest (or largest, or 
extremal) eigenvalues and eigenvectors of an n x n generalized eigenproblem Ay = XBy, 
by iteratively improving guesses for the eigenvectors until convergence is achieved. (B 
is the identity in our case, but we consider the full problem here for generality.) They 
are thus ideally suited for the problem of finding the few lowest eigenstates of Maxwell's 
equations. Many iterative eigensolver algorithms have been developed, but we focus our 
investigations on two in particular: preconditioned conjugate-gradient minimization of 
the block Rayleigh quotient [5,38-40], and the Davidson method [41-43] (an extension of 
Lanczos's algorithm [44]). We choose these two because they are able to take advantage 
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of the good preconditioned available for this problem, and because they are Krylov- 
subspace methods — at each step, they compute the best solution in the subspace of all 
previously tried directions (with some approximations, described below) [39,45]. 

As a test case for the convergence of the algorithms discussed below, we use a 3d 
diamond lattice of dielectric spheres (e = 12) in air, which has a gap between its second 
and third bands [3]. Except where otherwise noted, we solve for the first five bands 
at the L wavevector point with a 16 x 16 x 16 grid resolution in the (affine) primitive 
cell (~ 22.6 grid points per lattice constant), tracking the error in the eigenvalue trace 
versus the converged value, with the non-diagonal preconditioner of Eq. (14). A set 
of five pseudo-random fields are used as the starting point for the eigensolver, and we 
report the results from the case that converges in the median number of iterations. Note 
that the number of iterations in practice will often be less than that reported here — not 
only is the minimum solution tolerance usually bigger, but one typically starts with the 
fields from a nearby k point (yielding a significant "head start" vs. random fields). 

3.1 Conjugate- gradient minimization of the Rayleigh quotient 

The smallest eigenvalue Ao and the corresponding eigenvector yo of a Hermitian matrix 
A are well-known to satisfy a variational problem— they minimize the expression: 

Ao=min 4^o, (15) 

known as the "Rayleigh quotient," where t denotes the Hermitian adjoint (conjugate- 
transpose). One can then compute this eigenpair by performing an unconstrained min- 
imization of the Rayleigh quotient, using a method such as conjugate-gradient [46], 
To find the subsequent eigenvalue and eigenvector, the minimization is repeated while 
maintaining orthogonality to yo (through B), and so on; a process known as "defla- 
tion." As in all iterative methods, the matrix A need never be computed explicitly, only 
the product Ay (and By). This method [39] for computing the smallest eigenpairs of 
a matrix, also called a "Rayleigh-Ritz" algorithm, was employed by [5] to solve for the 
eigenstates of MaxwelPs equations in a planewave basis. 

3.1.1 The block Rayleigh quotient 

We use an extension of the conjugate-gradient Rayleigh- quotient method, adapted from 
[40], that solves for all of the eigenvectors at once instead of one-by-one with deflation. 
Such a process is called a "block" algorithm, and it has two advantages in our case: 
first, it can take advantage of efficient block matrix operations that have superior cache 
utilization on modern computers [47,48]; second, on parallel machines, block algorithms 
can communicate in larger chunks that mask latencies. Actually, as described in Sec. 3.4, 
we employ a hybrid of block algorithms and deflation, but such a generalization can only 
be made if one has a block method to begin with. 

Let Y be an n x p matrix whose columns are the p eigenvectors with the small- 
est eigenvalues. Then Y minimizes the trace tr [Y^AY] subject to the orthonormality 
constraint Y^BY = I (where J is the identity matrix) [50]. Although it is possible to 
perform such a minimization directly by observing the differential geometry of the con- 
straint manifold [49], it is more convenient to introduce a rescaling, similar to that of 
the Rayleigh quotient, that makes the problem unconstrained. 3 If we write: 

Y = Z (&BZ)~ l/2 (16) 

3 We also tried using the differential-geometry conjugate-gradient algorithm of [49]. In addition to 
requiring more matrix operations per iteration than the method described here, it also seemed to work 
much more poorly with our choice of preconditioners, for unknown reasons. 
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for an arbitrary non-singular n x p matrix Z, then the orthonormality constraint is 
automatically satisfied — this is known as the "polar" or "symmetric" orthogonalization 
of Z, and happens to be optimal from a certain numerical-stability standpoint [51]. 
More importantly, we can now solve for the eigenvectors by performing an unconstrained 
minimization of the block Rayleigh quotient: 

tr [Z^AZU] , (17) 

where U = (Z^BZ)~ X . Once the minimization is completed, the eigenvectors are a 
superposition of the columns of Z, and can be found by diagonalizing the px p matrix 
Y*AY where Y is given by Eq. (16). 



3.1.2 Conjugate gradient 

The conjugate-gradient algorithm is a method for minimizing multidimensional func- 
tions by searching along successive "conjugate" directions, chosen so that each line 
minimization does not spoil the previous ones [46]. Strictly speaking, the algorithm is 
designed only for quadratic functions, for which it is a Krylov-subspace method. Even 
non-quadratic forms such as Eq. (17), however, are approximately quadratic near their 
minima and so conjugate-gradient is beneficial. The minimization direction is usually 
chosen starting from the steepest-descent (gradient) direction G, which in this case is 
given by [40]: 

G = P L AZU, (18) 

where P± is the projection operator onto the space orthonormal to Z: P± = l—BZUZ^. 
The conjugate minimization direction D is then: 

D = KG + jD 0 , (19) 
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where K is a preconditioning operation (discussed below), Do is the minimization di- 
rection from the previous step, and 7 is either: 

tr \&KG] 

7 = r L t . \ (20) 
tr [gIkGo\ 

in the Fletcher-Reeves variant or: 

tr UG-Go) 1 KG] 
7= 1 r f . i (21) 
tr [GIkGo\ 

in the Polak-Ribiere variant, where Go is the gradient direction from the previous step. 
Both variants involve approximations for the derivative of G [49], with Polak-Ribiere 
being slightly more accurate at the expense of requiring extra storage space for Go. If we 
instead choose 7 = 0, the result is simply the preconditioned steepest-descent algorithm. 
The relative convergence of these variations is depicted in Fig. 2, and displays a sharp 
acceleration for conjugate-gradient once the error becomes small, presumably due to 
the function becoming approximately quadratic. (Our subsequent convergence plots use 
Polak-Ribiere.) 

Once the conjugate direction is determined, one needs to minimize Eq. (17) over Z' = 
Z+aD. By substituting Z' into Eq. (17), one finds a function in terms of a and constant 
p x p matrices, which we then minimize via a one-dimensional optimization algorithm 
specifically designed for use with conjugate-gradient-like methods [52]. Alternatively, one 
could make a two-point approximation for the second derivative of the function along 
the line and fit to a quadratic (i.e., apply one step of Newton's method) [38,40]. This 
method requires two fewer 0{p 2 n) matrix multiplications (out of eight per iteration), 
but often produces somewhat slower and less reliable convergence. 

3.1.3 Preconditioning 

Large gains in the rate of convergence can be achieved by choosing a proper Hermitian 
preconditioning operator K in Eq. (19) — ideally, an approximate inverse of the Hessian 
(second-derivative) matrix of the objective functional, Eq. (17). 4 One can think of this 
as an application of the multi-dimensional Newton's method to find the zero of the 
gradient G: the Newton update KG of the solution guess Z is the function G divided 
by its derivative. Equivalently, if we are at 

Z = Z* + SZ y (22) 

where Z* is the unknown minimum, we wish to solve for 5Z = KG to lowest order. 
Thus, we substitute Eq. (22) into Eq. (18) for the gradient and expand to first order in 
8Z, noting that G* = 0, and find [53]: 

G = P± (ASZ - BSZUZ^AZ) U, (23) 

Suppose that Z were rotated to diagonalize the generalized eigenproblem Z^AZx = 
Z^BZxX. Then, the second term in Eq. (23) would be simply B5Z times the current 
eigenvalue approximations. Inverting Eq. (23) thus involves the inversion of A — B A. At 
this point, however, we make an additional approximation: since the desired eigenvalues 
A are small, we neglect that term and use: 

G = P±A6ZU. (24) 



4 In the literature, "preconditioner" sometimes refers instead to K~ l , the approximate Hessian. 
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Fig. 3. The effect of two preconditioning schemes from section 2.4, diagonal and 
transverse-projection (non-diagonal), on the conjugate-gradient method. 

An inverse of this equation is given by the Hermitian operation 5Z = A~ x GU~ l , which 
is then further approximated by: 

8Z^kG = A ri GU~\ (25) 

where we have used an approximate inverse for A such as those in Sec. 2.4. The effects 
of the preconditioner of Eq. (25) with our two approximate inverses are demonstrated 
in Fig. 3, showing significant benefit from the non-diagonal preconditioner. 

Additional refinements are possible that we do not consider here. One could solve 
Eq. (23) exactly by an iterative method, or at least a few iterations thereof to improve 
the preconditioner. Also, because of the projection operator Pjl, one has a choice of 
how to invert Eqs. (23, 24) — it has been suggested in a similar context that one should 
choose 5Z in the space orthogonal to Z [43]. This can be done at the expense of a few 
extra matrix operations, but it did not seem to improve convergence significantly in our 
case. Such a choice would become more important, however, if one attempted a more 
accurate preconditioner that inverts expressions of the form A — BX, lest singularities 
arise. 

3.2 The Davidson method 

In the Davidson method, instead of iteratively improving a single (block) eigenvector 
approximation Y, one builds up an increasing subspace V that eventually contains the 
desired eigenvectors [42]. While both the Davidson and conjugate-gradient methods axe 
Krylov-subspace algorithms, this is only exactly true of conjugate-gradient for pure 
quadratic forms, and of Davidson in the absence of restarting (described below). We 
summarize the method briefly in the following. 

At each iteration, V is an n x q matrix for some q > p, whose columns span the current 
subspace, with V* BV = I. The task is to find p new vectors to add to V so that it spans 
a better approximation for the desired p eigenvectors. To do this, one first computes the 
best p eigenvectors Y so far: A is projected to the V subspace, A v = V^AV, and the 
associated eigenvectors are computed for this q x. q matrix; of these eigenvectors, one 
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Fig. 4. Comparison of the Davidson method with the block conjugate-gradient algo- 
rithm of section 3.1. We reset the Davidson subspace to the best current eigenvectors 
every 2, 3, 4, or 5 iterations, with a corresponding in increase in memory usage and 
computational costs. 

chooses the block of the p smallest eigenvalues, L = diag{\\ • • • A p ), and their associated 
q x p eigenvectors Y v , and computes the "Ritz" eigenvectors Y = VY V . The residual (i.e., 
error) for this eigenvalue approximation is R = BYL — AY. The new directions to be 
added to V are then given by D = KR (orthogonalized, to maintain the orthonormality 
of V) y where K is a preconditioning operator. 

To keep the subspace limited to a reasonable size, this process is usually restarted 
every few iterations by setting V = V, with some tradeoff in speed of convergence. 
(Alternatively, one could keep the r lowest eigenvectors for some r > p [43].) Restarting 
in this way on every iteration (never increasing the subspace) is a form of steepest- 
descent; a method of this sort was used in [12]. The ideal preconditioner K for the 
Davidson method, remarkably, has been shown to be the same as that of Eq. (23) 
for trace-minimization, combined with the fact that Y in this case is an orthonormal 
solution to the projected eigenproblem. That is, KR should be an approximate solution 
for D in [43]: 

R= P x (AD - BDL) , (26) 

where L is the diagonal matrix of eigenvalues from above and Fx is the projection matrix 
1 - BYY* as in Sec. 3.1.2. Again, D should ideally be found in the space orthonormal 
to Y [43]; this is the "Jacobi-Davidson" algorithm. To solve Eq. (26), we make the 
same approximations as in Sec. 3.1.3 and use KR = A~ l R. The Davidson method 
is also adaptable to non-Hermitian operators, and it has thus been employed to treat 
electromagnetic problems containing absorption [14]. 

The results of a comparison of the Davidson method described here with the block 
conjugate-gradient minimization of the Rayleigh quotient from the previous section are 
shown in Fig. 4. Various restarting frequencies are used, with more infrequent restarts 
resulting in faster convergence but also higher memory requirements and a greater 
computational cost per iteration (e.g. more than twice as much memory, and 1.6 times as 
many floating-point operations for eigenvector matrix products, in our period-5 restarted 
Davidson compared to conjugate-gradient). Overall, it does not seem to match the 
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Fig. 5. Conjugate- gradient convergence of the lowest TM eigenvalue for the "inte- 
rior" eigensolver of Eq. (27), solving for the monopole defect state formed by one 
vacancy in a 2D square lattice of dielectric rods in air, using three different supercell 
sizes (3 x 3, 5 X 5, and 7 x 7). 

performance of the conjugate-gradient method in our case. Nevertheless, we believe it is 
important that future studies continue to investigate the Davidson method with different 
preconditioners and restarting schemes, especially since it seems to exhibit more regular 
convergence than the conjugate-gradient algorithm. 



3.3 Interior eigenvalues 

One of the most interesting aspects of photonic crystals is the possibility of localized 
modes associated with point defects (cavities) and line defects (waveguides) in the crys- 
tal [1,54]. In this case, the desired modes lie in a known frequency range (the band gap) 
in the interior of the spectrum. Moreover, since a supercell of some sort must be used to 
surround the defect with sufficient bulk crystal, the underlying non-localized modes are 
"folded" many times (increasing their number by the number of primitive cells in the 
supercell). Ideally, one would like to compute only the defect modes in the band gap, 
without the waste of computation and memory of finding all the folded modes below 
them. One way to accomplish this is to use the FDTD algorithms referenced in Sec. 1. 
Here, we demonstrate that it is also possible to compute only the interior eigenvalues 
using iterative frequency-domain methods. 

The center of the band gap is known, so we can state the problem as one of finding 
the eigenvalues and eigenvectors closest to the mid-gap frequency u) m . Since the methods 
presented above compute the minimum eigenvalues of an operator, we can apply them 
directly merely by shifting the spectrum [55]: 

4=(V^) 2 - (27) 

This operator has the same eigenvectors as but its lowest eigenvalues are the ones 
closest to cj m , and thus we can compute a single defect state without computing any 
other eigenstates. As a preconditioner for this operator, we again approximate a/ m as 
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small and simply use KG = A- 2 GU~ l , essentially the square of Eq. (25). This imper- 
fect preconditioner and the worsened condition number of Eq. (27) will lead to slower 
eigensolver convergence, but this is generally more than compensated for by the re- 
duction in the number of bands computed (and the resulting savings in memory and 
time per iteration). To demonstrate this "interior" eigensolver, we consider the case of a 
two-dimensional square lattice of dielectric rods in air, with a single rod removed from 
the crystal. Such a defect supports one confined TM mode in the gap [54], and in Fig. 5 
we plot the eigensolver convergence of the lowest TM eigenvalue of Eq. (27), for 3 x 3 to 
7x7 supercells with a resolution of 16 grid points per lattice constant. (To find the same 
state in a 7 x 7 supercell via the ordinary method, p — 49 bands must be computed.) 

An alternative method for computing interior eigenvalues using a modified version 
of the Davidson method has also been suggested [43] and we intend to investigate 
this algorithm in future work, along with more accurate preconditioners (e.g. based on 
iterative linear solvers). 

3.4 To block or not to block? 

We have described the use of block eigensolvers to solve for all of the desired eigenstates 
simultaneously. The classical algorithm of computing them one by one with deflation 
has its advantages, however. First, it uses less memory — 0(pn) memory is required to 
store the eigenstates themselves, but only 0(n) memory is required for auxiliary matri- 
ces such as G and £>, versus 0(pn) for the block method. (At least two such auxiliary 
matrices seem to be required for conjugate gradient, although three is more convenient, 
and one more for the Polak-Ribiere method.) Second, because our preconditioner is a 
better approximation for the lower bands (where lj is smallest), the eigensolver con- 
vergence of the upper bands is slower — it is more efficient to compute them separately. 
Third, the eigensolver iterations themselves can require fewer operations for the defla- 
tion algorithm, as described below. 

Since blocked algorithms maintain inherent cache-reuse advantages on modern com- 
puter architectures [47], however, it makes sense to consider a hybrid approach: we use 
the blocked eigensolver to solve for b bands at a time, with deflation to orthogonalize 
against the previous bands. The optimal choice of b depends upon the particular prob- 
lem size considered and hardware/software details, but it typically ranges between five 
and fifteen bands. (As long as 6 is several times smaller than p, the memory advantages 
of deflation are realized.) Often, a factor of two or more in speed is gained compared to 
either b = 1 or b = p. 

Let us consider the arithmetic complexity of the iterations. Each eigensolver iter- 
ation for b bands requires 0(£nb 2 ) operations for I matrix operations such as Zt/, 
and 0(bn log n) operations for A in a planewave basis. Furthermore, orthogonalization 
against the q previous bands requires 0(2nbq) operations (once or twice per iteration, 
to maintain numerical stability). Assuming that all bands require the same number of 
iterations, this must be repeated p/b times. Thus, each eigensolver iteration requires a 
total of 0(£npb) + 0(pn log n) operations for the bands themselves, plus 0(np(p — b)) 
operations for the orthogonalization. So, if t is roughly greater than the number of or- 
thogonalizations per iteration (£ ~ 8 in our implementation), then deflation requires 
fewer operations per iteration than computing all of the bands at once. Of course, the 
actual running time scales in a more complicated way, due to the cache, CPU pipeline, 
and other issues. 

3.5 Scaling 

One final question that we wish to address is that of scaling: how does the eigensolver 
convergence rate scale with the size of the problem? Specifically, we consider the conver- 
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Fig. 6. Scaling of the number of conjugate-gradient iterations required for conver- 
gence (to a fractional tolerance of 10~ 7 ) as a function of the spatial resolution (in 
grid points per lattice constant, with a corresponding planewave cutoff), or the 
number p of bands computed. 

gence of the block conjugate-gradient algorithm for the diamond structure as we increase 
either the resolution or the number of bands. In both cases, we plot the mean number 
of iterations (over five runs with random starting fields) for the eigensolver to converge 
to within a fractional tolerance of 10~ 7 (which is probably lower than the tolerance that 
would be used in practice) . In the case of the increased number of bands, we solve for 
the bands in blocks of b — 5 bands, as described in Sec. 3.4, and use the average number 
of iterations per block. The results, plotted in Fig. 6, demonstrate that the number 
of iterations increases only very slowly as the problem size is increased. (We suspect 
that the slowed convergence for the increased number of bands results largely from the 
worsening of the small-u; approximation in our preconditioner.) In contrast, FDTD al- 
gorithms must scale their number of iterations (or, strictly speaking, At) linearly with 
the spatial resolution in order to maintain stability. 

4 Conclusion 

We have presented efficient preconditioned block-iterative algorithms for computing 
eigenstates of Maxwell's equations for periodic dielectric systems using a planewave 
basis. Such methods, combined with appropriate effective dielectric tensors for accuracy, 
interior eigensolvers to compute only the desired modes in defect systems, and a flexible 
and freely-available implementation, provide an attractive way to perform eigen-analyses 
of diverse electromagnetic systems. 

Acknowledgments 

This work was supported in part by the Materials Research Science and Engineering 
Center program of the National Science Foundation under Grant No. DMR-9400334 and 
by the U. S. Army Research Office under Grant No. DAAG55-97-1-0366. S. G. Johnson 
is grateful for the support of a National Defense Science and Engineering Graduate 
Fellowship and an MIT Karl Taylor Compton Fellowship. 



#27937 -$15.00 US 
(C) 2001 OSA 



Received November 17, 2000; Revised December 12, 2000 
29 January 2001 / Vol. 8, No. 3 / OPTICS EXPRESS 190 



